
ToolExpander: Extending the Frontiers of Tool-Using Reinforcement Learning to Weak LLMs

Chen, Fu, Wang, Peng, Li, Xiyin, Li, Wen, Lei, Shichi, Xiang, Dongdong

arXiv.org Artificial Intelligence

Training Large Language Models (LLMs) with Group Relative Policy Optimization (GRPO) encounters a significant challenge: models often fail to produce accurate responses, particularly in small-scale architectures. This limitation not only diminishes performance improvements and undermines the potential of GRPO but also frequently leads to mid-training collapse, adversely affecting stability and final efficacy. To address these issues, we propose ToolExpander, a novel framework that advances tool-oriented reinforcement learning for resource-constrained LLMs through two key innovations: (1) Dynamic Multi-Round Hard Sampling, which dynamically substitutes challenging samples (those without correct outputs over 10 rollouts) with high-quality few-shot demonstrations during training, coupled with an exponential learning rate decay strategy to mitigate oscillations; (2) Self-Exemplifying Thinking, an enhanced GRPO framework that eliminates KL divergence and incorporates adjusted clipping coefficients, encouraging models to autonomously generate and analyze few-shot examples via a minimal additional reward (0.01). Experimental results demonstrate that ToolExpander significantly enhances tool-using capabilities in LLMs, especially in weaker small-scale models, improving both training stability and overall performance.
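The mechanisms described in the abstract can be sketched in a few lines; the specific clip values, decay rate, and data shapes below are illustrative assumptions, not the paper's reported settings.

```python
import math
import random

def grpo_advantages(rewards):
    """Group-relative advantages: rewards normalized within one rollout group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

def clipped_objective(ratio, advantage, clip_low=0.2, clip_high=0.28):
    """PPO-style clipped surrogate with no KL penalty term; the asymmetric
    clip_low/clip_high values are hypothetical stand-ins for the paper's
    'adjusted clipping coefficients'."""
    clipped = max(min(ratio, 1 + clip_high), 1 - clip_low)
    return min(ratio * advantage, clipped * advantage)

def reward_with_self_example(correct, produced_example, bonus=0.01):
    """Base correctness reward plus the minimal additional reward (0.01)
    for autonomously generating a few-shot example."""
    return (1.0 if correct else 0.0) + (bonus if produced_example else 0.0)

def substitute_hard_samples(batch, rollout_correct_counts, demos):
    """Dynamic Multi-Round Hard Sampling: a prompt with zero correct outputs
    across its rollouts is swapped for a high-quality few-shot demo."""
    return [random.choice(demos) if n == 0 else prompt
            for prompt, n in zip(batch, rollout_correct_counts)]

def decayed_lr(base_lr, step, decay=0.999):
    """Exponential learning-rate decay to damp training oscillations."""
    return base_lr * decay ** step
```

The hard-sample substitution keeps every gradient step informative: a group whose rollouts are all wrong yields zero group-relative advantage, so replacing it with a demonstration avoids wasted (or destabilizing) updates.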


Distilling Calibration via Conformalized Credal Inference

Huang, Jiayi, Park, Sangwoo, Paoletti, Nicola, Simeone, Osvaldo

arXiv.org Artificial Intelligence

Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.
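The offline/run-time split described above can be sketched as follows; the use of KL divergence and this particular conformal quantile rule are assumptions standing in for the paper's divergence measure and calibration details.

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between two points on the probability simplex."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def conformal_threshold(cal_cloud, cal_edge, alpha=0.1):
    """Offline phase: score each calibration example by the divergence
    between cloud and edge predictive probabilities, then take the
    conformal quantile. With n scores, the ceil((n+1)(1-alpha))-th
    smallest gives >= 1 - alpha coverage for a fresh cloud prediction."""
    scores = sorted(kl(c, e) for c, e in zip(cal_cloud, cal_edge))
    n = len(scores)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[k - 1]

def in_credal_set(candidate, edge_probs, tau):
    """Run time: the credal set is every distribution within divergence
    tau of the edge model's prediction."""
    return kl(candidate, edge_probs) <= tau
```

Only the scalar threshold needs to be shipped to the device: at run time the edge model checks candidate distributions against its own prediction, with the conformal guarantee that the cloud model's prediction lies inside the set at the chosen confidence level.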


Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression

Liu, Jiduan, Liu, Jiahao, Wang, Qifan, Wang, Jingang, Cai, Xunliang, Zhao, Dongyan, Wang, Ran Lucien, Yan, Rui

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks. However, the massive size of these models poses huge challenges for their deployment in real-world applications. While numerous model compression techniques have been proposed, most of them are not well-suited for achieving extreme model compression when there is a significant gap in model scale. In this paper, we introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT), which effectively transfers the knowledge of LLMs to extremely small-scale models (e.g., 1%). In particular, our approach extracts knowledge from LLMs to construct a knowledge store, from which the small-scale model can retrieve relevant information and leverage it for effective inference. To improve the quality of the model, soft prompt tuning and Proximal Policy Optimization (PPO) reinforcement learning techniques are employed. Extensive experiments are conducted on low-resource tasks from SuperGLUE and GLUE benchmarks. The results demonstrate that the proposed approach significantly enhances the performance of small-scale models by leveraging the knowledge from LLMs.
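The retrieve-then-infer loop can be sketched as a toy knowledge store; cosine similarity and the hand-set embeddings below are illustrative assumptions, since in practice embeddings would come from an encoder and the store from LLM-generated knowledge.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class KnowledgeStore:
    """Toy store of (embedding, knowledge text) pairs distilled from an LLM;
    the small model retrieves the top-k nearest entries and conditions its
    inference on them."""
    def __init__(self):
        self.entries = []  # list of (embedding, knowledge_text)

    def add(self, embedding, knowledge_text):
        self.entries.append((embedding, knowledge_text))

    def retrieve(self, query_embedding, k=2):
        # Rank all stored entries by similarity to the query, descending.
        ranked = sorted(self.entries,
                        key=lambda e: cosine(query_embedding, e[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]
```

The design choice mirrors the paradigm: the large model's knowledge lives outside the small model's parameters, so the small model only needs the capacity to retrieve and consume it, not to memorize it.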


Artificial intelligence to influence top tech trends in major way in next five years

#artificialintelligence

Artificial intelligence will be the common theme in the top 10 technology trends in the next few years, and these are expected to quicken breakthroughs across key economic sectors and society, the Alibaba Damo Academy says. The global research arm of Chinese technology major Alibaba Group says innovation will be extended from the physical world to a mixed reality, as more innovation finds its way to industrial applications and digital technology drives a green and sustainable future. "Digital technologies are growing faster than ever," Jeff Zhang, president of Alibaba Cloud Intelligence and head of Alibaba Damo, said in a report released on Monday. "The advancements in digitisation, 'internetisation' and intelligence are redefining a digital world that is characterised by the prevalence of mixed reality. Digital technology plays an important role in powering a green and sustainable future, whether it is applied in industries such as green data centres and energy-efficient manufacturing, or in day-to-day activities like paperless office."